Automatic Transliteration and Back-transliteration by Decision Tree Learning

نویسندگان

  • Byung-Ju Kang
  • Key-Sun Choi
چکیده

Automatic transliteration and back-transliteration across languages with drastically different alphabets and phonemes inventories such as English/Korean, English/Japanese, English/Arabic, English/Chinese, etc, have practical importance in machine translation, crosslingual information retrieval, and automatic bilingual dictionary compilation, etc. In this paper, a bi-directional and to some extent language independent methodology for English/Korean transliteration and back-transliteration is described. Our method is composed of character alignment and decision tree learning. We induce transliteration rules for each English alphabet and back-transliteration rules for each Korean alphabet. For the training of decision trees we need a large labeled examples of transliteration and backtransliteration. However this kind of resources are generally not available. Our character alignment algorithm is capable of highly accurately aligning English word and Korean transliteration in a desired way.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Transliteration Using a Network of Phoneme Chunks

In this paper, we present methods of transliteration and back-transliteration. In Korean technical documents and web documents, many English words and Japanese words are transliterated into Korean words. These transliterated words are usually technical terms and proper nouns, so it is hard to find them in a dictionary. Therefore an automatic transliteration system is needed. Previous transliter...

متن کامل

A Chunk-based n-gram English to Thai Transliteration

In this study, a chunk-based n-gram model is proposed for English to Thai transliteration. The model is compared with three other models: Table lookup model, decision tree model, and statistical model. The chunk-based ngram model achieves 67% word accuracy, which is higher than the accuracy of other models. Performances of all models are slightly increased when an English grapheme to phoneme is...

متن کامل

Learning Transliteration Lexicons from the Web

This paper presents an adaptive learning framework for Phonetic Similarity Modeling (PSM) that supports the automatic construction of transliteration lexicons. The learning algorithm starts with minimum prior knowledge about machine transliteration, and acquires knowledge iteratively from the Web. We study the active learning and the unsupervised learning strategies that minimize human supervis...

متن کامل

English-to-Korean Transliteration using Multiple Unbounded Overlapping Phoneme Chunks

We present in this paper the method of English-to-Korean(E-K) transliteration and back-transliteration. In Korean technical documents, many English words are transliterated into Korean words in various forms in diverse ways. As English words and Korean transliterations are usually technical terms and proper nouns, it is hard to nd a transliteration and its variations in a dictionary. Therefore ...

متن کامل

Machine Learning Based English-to-Korean Transliteration Using Grapheme and Phoneme Information

Machine transliteration is an automatic method to generate characters or words in one alphabetical system for the corresponding characters in another alphabetical system. Machine transliteration can play an important role in natural language application such as information retrieval and machine translation, especially for handling proper nouns and technical terms. The previous works focus on ei...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000